DETAILED ACTION
Response to Amendment 

1. The amendment filed on 8/10/2005 has been entered. The amendments to the
claims and/or the arguments overcome the 35 USC 101 and 112
rejections.

Response to Arguments 

2. Applicant's arguments filed 8/10/2005 have been fully considered but they are 
not persuasive. The arguments concerning the Antani article and the Chun article have
been considered, but they are deemed not to be persuasive. During the interview held
on 8/9/2005 the examiner failed to discuss that these claims are open-ended "comprising"
claims. Thus, a reference that teaches the claim and more anticipates the claim.

The Antani article, Robust Extraction of Text in Video, teaches on page 833
second column lines 11-12 that the detected text will be stationary, which is the same as
the newly claimed "static overlay" since over many frames of video overlayed text is
stationary or static. Thus, the arguments concerning the newly added "static overlay"
limitation are not persuasive. Applicant further argues this article does not teach
detecting a potential overlay and then further verifying the potential overlay is an actual
overlay. At least the detection stage discussed in section 2 detects a potential
overlay, and at least the spatio-temporal stage, the tracking, or the temporal processing
of section 4 teaches determining whether the potential text is actual text. That actual text
is overlay text because the temporal processing verifies the text is an overlay text
rather than a part of the moving background video. In order to overcome this reference
the detecting step and the verifying step need to be amended to distinguish applicants'
detection and verification from the article's detection and verification.

The Chun article, Text Extraction in Videos using Topographical Features of
Characters, teaches on page 1129 second column in section 5 "a new method to extract
caption area at video image by topographical features of characters" which is detecting 
overlayed text in the video rather than detecting the scene text which will have different 
topographical features. It is seen in the article the captions in the video are static 
relative to the display screen. Thus, the arguments concerning the newly added "static 
overlay" limitation are not persuasive. Applicant further argues this article does not 
teach detecting a potential overlay and then further verifying the potential overlay is an 
actual overlay. At least section 3.1 teaches detecting a potential static overlay and at
least section 3.2 teaches verifying an actual static overlay by detecting a potential caption
and then verifying the caption. In order to overcome this reference the detecting step
and the verifying step need to be amended to distinguish applicants' detection and 
verification from the article's detection and verification. 

Allowable Subject Matter 

3. Claims 4-21 and 28 are objected to as being dependent upon a rejected base 
claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. Claims 29-34 are allowable.

4. The following is a statement of reasons for the indication of allowable subject
matter: 

Claims 4-21 and 28-34: 
The prior art of record fails to teach or suggest detecting the potential overlay by using
wavelet decomposition on the video sequence, extracting features based on the wavelet 
decomposition, and performing neural network processing on the extracted features. 

Claim Rejections - 35 USC § 102 

5. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

(b) the invention was patented or described in a printed publication in this or a foreign country or in public
use or on sale in this country, more than one year prior to the date of application for patent in the United
States.

6. Claims 1-3, 22, 23, 26, and 27 are rejected under 35 U.S.C. 102(b) as being 
anticipated by Byung Tae Chun, Younglae Bae, Tai-Yun Kim, Text Extraction in Videos 
using Topographical Features of Characters, August 22-25, 1999, IEEE, vol. 2, pages 
1126-1130. 

This article teaches extracting text from video by two main steps of extracting 
candidate areas using topographical features and then verifying text is in those areas. 
Section 3.1 discusses extracting candidate area for text area and section 3.2 discusses 
verification of candidates of text area. Section 2 discusses character regions having 
some fixed colors and sizes, and are densely located in the horizontal direction, as 
shown in Fig. 2. The colors and shapes are not regular in the background. Thus, text in
the actual video will have movement and will likely have a size different than Chun's
algorithm's text size. Therefore, Chun recognized the difference between original
video with text and text overlayed onto the original video having original text and
teaches to one of ordinary skill in the art to discriminate between the original text and
the overlayed text.

A detailed analysis of the claims follows. 
Claim 1: 

Chun teaches a method of video processing to be performed by video processing 
equipment, the method (See introduction.) comprising: 

extracting a pre-existing static overlay present in a video sequence (See the
introduction, paragraph 1, which discusses text appearing in video such as news, where
it is often used to identify people, see figure 2, and to place identifying marks, see the
upper left and lower right corners of figure 2. See section 5 which discusses detecting
captions which are stationary text.), said extracting comprising:

detecting at least one potential overlay in the video sequence (Section 3.1
discusses extracting candidate area for text area.); and

verifying that each at least one potential overlay is an actual static overlay that 
was previously added to an original video sequence to obtain said video sequence 
(Section 3.2 discusses verification of candidates of text area. Section 2 discusses
character regions having some fixed colors and sizes, and are densely located in the
horizontal direction, as shown in Fig. 2. The colors and shapes are not regular in the
background. Thus, text in the actual video will have movement and will likely have a
size different than Chun's algorithm's text size. Therefore, Chun recognized the
difference between original video with text and text overlayed onto the original video
having original text and teaches to one of ordinary skill in the art to discriminate
between the original text and the overlayed text. Captions are usually static. Also
section 5 discusses "caption area" which implies a stationary area. Thus this article 
detects a static text overlay in the caption area.). 

Claim 2: 

Chun teaches the method of claim 1, further comprising the step of post-processing at
least one actual static overlay to remove extraneous pixels (Figure 1 shows the post
processing step of removing noise. Noise comprises extraneous pixels such as
non-character regions inside the character regions, see section 3.3, thus, Chun teaches
removing extraneous pixels.).

Claim 3: 

Chun teaches the method of claim 2, wherein said step of post-processing comprises 
the steps of: 

computing a variance for each pixel of the at least one actual static overlay (Section 3.3
discusses removing noise by using Isodata color clustering. The verified actual overlay
area is analyzed to determine the color of each pixel to cluster the pixels in the overlay
area.); and

comparing the variance with a threshold to determine whether or not the pixel should be
removed as an extraneous pixel (The sizes of the color clusters are compared and if they
are too small the cluster is removed, which removes the pixels forming each cluster.).
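
The two steps recited in claim 3 (a per-pixel variance, then a comparison against a threshold) can be illustrated with a short sketch. It is an illustration only and is neither applicant's disclosed method nor the Isodata color clustering of Chun. It assumes grayscale frames stored as nested lists, assumes the variance is taken over time at each pixel position, and uses a hypothetical function name, mask layout, and threshold value.

```python
from statistics import pvariance


def remove_extraneous_pixels(frames, overlay_mask, threshold):
    """Return a refined mask that keeps only overlay pixels whose
    temporal variance is at or below `threshold` (assumed criterion)."""
    height, width = len(overlay_mask), len(overlay_mask[0])
    refined = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not overlay_mask[y][x]:
                continue  # pixel was never part of the verified overlay
            # Variance of this pixel's intensity across all frames.
            variance = pvariance([frame[y][x] for frame in frames])
            refined[y][x] = variance <= threshold
    return refined


# Three 2x2 frames; only the pixel at row 0, column 1 changes over time.
frames = [
    [[200, 10], [200, 200]],
    [[200, 90], [200, 200]],
    [[200, 50], [200, 200]],
]
mask = [[True, True], [True, False]]
refined_mask = remove_extraneous_pixels(frames, mask, threshold=25.0)
```

In this toy input the fluctuating pixel is dropped from the mask, while the constant pixels that were already in the mask are kept.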

Claim 22:

Chun teaches the method of Claim 1, wherein said step of detecting comprises the step
of:

performing template matching to determine the presence of a potential overlay (Sections
2 and 3.1 discuss using the topographical features of characters to determine the
presence of a potential overlay. Topographical features of characters define a template for
each character or groups of characters.).
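
For orientation, template matching in its generic form can be sketched as follows. This is a minimal illustration and not the topographical-feature method of Chun or applicant's disclosed matching. It assumes grayscale images stored as nested lists, scores each position by the sum of absolute differences, and uses a hypothetical function name and acceptance threshold.

```python
def match_template(image, template):
    """Slide `template` over `image` and return (best_score, (row, col)),
    scoring each position by the sum of absolute differences (lower is
    better). Assumes the template is no larger than the image."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            sad = sum(
                abs(image[top + r][left + c] - template[r][c])
                for r in range(tpl_h)
                for c in range(tpl_w)
            )
            if best is None or sad < best[0]:
                best = (sad, (top, left))
    return best


image = [
    [0, 0, 0, 0],
    [0, 9, 1, 0],
    [0, 1, 9, 0],
    [0, 0, 0, 0],
]
template = [[9, 1], [1, 9]]
score, location = match_template(image, template)
# Hypothetical acceptance threshold: a low score marks a potential overlay.
has_potential_overlay = score <= 2
```

A location whose score falls under the threshold would be treated only as a potential overlay; a separate verification step would still be needed.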

Claim 23: 

Chun teaches the method of claim 22, wherein said step of detecting further comprises 
the step of: 

determining a template (The paragraph before section 3 discusses determining n and
alpha. The values of n and alpha form a template.) to be used in said step of
performing template matching.

Claim 26: 

Chun teaches a computer readable medium containing computer-executable code for
causing a computer to implement the method of claim 1 (Chun discusses using a
computer to perform the text extraction in section 4. The discussed Pentium 4
computer using a program written in Visual C++ Ver. 5.0 has the program stored in a
computer readable medium, the disk drive and RAM.).

Claim 27:

Chun teaches a computer system comprising: 

a computer (Chun discusses using a Pentium 4 computer to perform the text extraction
in section 4.); and

a computer readable medium coupled to said computer and containing
computer-executable code for causing a computer to implement the method of claim 1 (Chun
discusses using a computer to perform the text extraction in section 4. The discussed
Pentium 4 computer using a program written in Visual C++ Ver. 5.0 has the program
stored in a computer readable medium, the disk drive and RAM.).



7. Claims 1, 22-27, and 35-38 are rejected under 35 U.S.C. 102(a) as being
anticipated by S. Antani, D. Crandall, R. Kasturi, Robust Extraction of Text in Video,
Sept 3-7, 2000, IEEE, vol. 1, pages 831-834.

This article teaches detecting static overlays on video by performing a frame to
frame comparison of the video. In section 3, second paragraph at lines 7-11,
"artificial caption text" and "scene text occurring naturally in a video frame" are discussed.

The Antani article discusses the video having temporal information while the
overlayed characters have less temporal information and the overlayed characters are
contrasted by a changing background. See the Abstract and section 4. Text in the original
video will more likely have movement from frame to frame. The stop sign example
referenced in applicant's arguments would most likely be part of a moving background
while text overlayed onto the video will most likely be stationary. As discussed above,
Antani on page 833 second column lines 11-12 states the detected text will be
stationary. Therefore, Antani recognized the difference between original video with text
and text overlayed onto the original video and teaches to one of ordinary skill in the art
to discriminate between the original text and the overlayed text.

A detailed analysis of the claims follows. 
Claim 1: 

Antani teaches a method of video processing to be performed by video 
processing equipment, the method (See introduction.) comprising: 

extracting a pre-existing static overlay present in a video sequence (See the 
introduction, paragraph 1 second column which discusses text appearing in video.) said 
extracting comprising:

detecting at least one potential overlay in the video sequence (Section 2
discusses three stages, the detection, localization, and segmentation stages. The
detection stage detects a potential overlay.); and

verifying that each at least one potential overlay is an actual static overlay that 
was previously added to an original video sequence to obtain said video sequence 
(Section 2 discusses the localization stage which uses many methods to localize the 
text. Section 2 discusses using many different localization algorithms whose outputs are 
fused in the spatio-temporal decision fusion module over multiple frames to verify that a 
potential text is text. Section 2 also discusses using a tracking stage, which would
inherently verify the potential text is an actual text. The Abstract and section 4
discuss the video having temporal information while the overlayed characters have
less temporal information and the overlayed characters are contrasted by a changing
background. Text in the original video will more likely have movement from frame to
frame. The stop sign example referenced in applicant's arguments would most likely
be part of a moving background while text overlayed onto the video will most likely be
stationary. Therefore, Antani recognized the difference between original video with text
and text overlayed onto the original video and teaches to one of ordinary skill in the art
to discriminate between the original text and the overlayed text.).

Claim 22: 

Antani teaches the method of Claim 1, wherein said step of detecting comprises the
step of:

performing template matching to determine the presence of a potential overlay (Section
2 discusses the detection of potential overlay in the detection stage which consists of
many different localization algorithms whose outputs are fused in the spatio-temporal
decision fusion module over multiple frames. In order to determine if text exists,
predefined knowledge of the text is compared with the current image to determine if a
match exists. Predefined knowledge of the text is a template.).

Claim 23: 

Antani teaches the method of claim 22, wherein said step of detecting further comprises 
the step of: 

determining a template to be used in said step of performing template matching
(Inherently at some time the templates used by the program were determined.).

Claim 24:

Antani teaches the method of claim 22, wherein said step of verifying comprises the 
steps of: 

performing frame-to-frame correlation of said potential overlay (Section 2
discusses using many different localization algorithms whose outputs are fused in
the spatio-temporal decision fusion module over multiple frames.); and

comparing a result of the frame-to-frame correlation with a threshold to determine
if the potential overlay is an actual static overlay or not (In order to determine if text
exists, predefined knowledge of the text is compared with the current image to
determine if a match exists. Predefined knowledge of the text is a template of
thresholds.).

Claim 25: 

Antani teaches the method of claim 24, wherein said step of performing frame-to-frame 
correlation (See the discussion above for claim 24.) comprises the steps of: 

forming a mean square error over a set of frames from said video sequence, 
averaged over all of the pixels in said potential overlay (This claim does not claim a use
for the mean square error, thus, a reference that forms a mean square error over a set
of frames teaches the claim. This claim does not claim how the mean square error is
formed, thus, a reference that inherently forms the error teaches the claim. The
specification in paragraph 0039 sets forth a specific formula for calculating the mean
square error, however, the claim only broadly claims how the claimed mean square
error is calculated. Although the claims are interpreted in light of the specification,
limitations from the specification are not read into the claims. See In re Van Geuns, 988
F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). The disclosed formula determines the
average difference in intensities between a current frame and a subsequent frame.
Antani inherently forms the mean square error since Antani in the localization stage
fuses over several frames decisions from many localization algorithms, which inherently
have determined the average difference in intensities between frames in order to
determine if text exists.).
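
To make concrete what the claim language recites (a mean square error over a set of frames, averaged over all pixels of the potential overlay, and, per claim 24, compared with a threshold), a generic sketch follows. It does not reproduce the formula of paragraph 0039 of the specification and is not taken from Antani. It assumes grayscale frames stored as nested lists, assumes the error is taken between consecutive frames, and uses hypothetical function names, a hypothetical (top, left, height, width) region layout, and an arbitrary threshold.

```python
def region_mse(frames, region):
    """Mean square error between consecutive frames, averaged over every
    pixel in `region` and over every consecutive frame pair."""
    top, left, height, width = region
    total = 0.0
    count = 0
    for current, following in zip(frames, frames[1:]):
        for y in range(top, top + height):
            for x in range(left, left + width):
                diff = current[y][x] - following[y][x]
                total += diff * diff
                count += 1
    if count == 0:
        raise ValueError("need at least two frames and a non-empty region")
    return total / count


def is_static_overlay(frames, region, threshold):
    """A low frame-to-frame error suggests the region is not moving."""
    return region_mse(frames, region) <= threshold


# Three 2x3 frames: columns 0-1 hold a constant caption, column 2 changes.
frames = [
    [[255, 255, 12], [255, 255, 80]],
    [[255, 255, 40], [255, 255, 15]],
    [[255, 255, 97], [255, 255, 60]],
]
caption = (0, 0, 2, 2)
background = (0, 2, 2, 1)
caption_is_static = is_static_overlay(frames, caption, threshold=4.0)
background_is_static = is_static_overlay(frames, background, threshold=4.0)
```

Under these assumptions the constant caption region yields zero error and passes the threshold, while the changing background column does not.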

Claims 26 and 27: 

Inherently the algorithm of Antani is performed by a computer having a computer 
readable medium containing computer-executable code for causing a computer to 
implement the claimed steps. 

Claim 35: 

Antani teaches a method of processing video to be performed by video processing 
equipment, the method, comprising: 

extracting a pre-existing static graphical (In section 3, second paragraph, lines 7-11,
"artificial caption text" and "scene text occurring naturally in a video frame" are discussed.
Artificial caption text is graphical because graphical includes many objects including
text.) overlay present in a video sequence, said extracting comprising:

detecting at least one potential overlay in the video sequence (Section 2 discusses
three stages, the detection, localization, and segmentation stages. The detection stage
detects a potential overlay.), said detecting comprising the step of:
performing template matching (Section 2 discusses the detection of potential overlay in
the detection stage which consists of many different localization algorithms whose
outputs are fused in the spatio-temporal decision fusion module over multiple frames.
In order to determine if text exists, predefined knowledge of the text is compared
with the current image to determine if a match exists. Predefined knowledge of the text
is a template.); and

verifying that each at least one potential overlay is an actual static overlay that was 
previously added to an original video sequence to obtain said video sequence (Section
2 discusses the localization stage which uses many methods to localize the text.
Section 2 discusses using many different localization algorithms whose outputs are
fused in the spatio-temporal decision fusion module over multiple frames to verify
that a potential text is text. Section 2 also discusses using a tracking stage, which would
inherently verify the potential text is an actual text. The Abstract and section 4
discuss the video having temporal information while the overlayed characters have
less temporal information and the overlayed characters are contrasted by a changing
background. Text in the original video will more likely have movement from frame to
frame. The stop sign example referenced in applicant's arguments would most likely
be part of a moving background while text overlayed onto the video will most likely be
stationary. Therefore, Antani recognized the difference between original video with text
and text overlayed onto the original video and teaches to one of ordinary skill in the art
to discriminate between the original text and the overlayed text.), said verifying
comprising the step of:

performing frame-to-frame correlation of a potential overlay determined by said 
detecting step (Section 2 discusses using many different localization algorithms whose
outputs are fused in the spatio-temporal decision fusion module over multiple frames.).

Claim 36: 

Antani teaches the method of Claim 35, wherein said step of detecting further 
comprises the step of: 

determining a template to be used in said step of performing template matching 
(Inherently at some time the templates used by the program were determined.).

Claim 37: 

Antani teaches the method of Claim 36, wherein said step of determining a template 
comprises the step of: 

performing addition or frame-by-frame subtraction of video frames (This claim does not
define the specifics of the addition of video frames or the frame-by-frame subtraction of
video frames. This claim does not state if pixel values are added or frame numbers are
added or if, as in Antani, the results of many frame analyses are fused or added or
subtracted.). This step does not state what function the addition or subtraction
performs; thus, the scope of the claim is broad and is met by Antani when a template for
the detection stage is determined, since the claim does not claim when the template is
determined and when the addition or subtraction is performed. Therefore in this
comprising claim all that is needed is for the reference to teach the claimed steps.

Claim 38:

Antani teaches the method of Claim 36, wherein said step of determining a template 
comprises the steps of:

segmenting video frames into foreground and background objects (Text is foreground
and video is the background. See the Abstract at the next to last and last sentences.
Section 1 second paragraph lines 8-9.);

performing correlation tracking to determine if any foreground object remains in the
same absolute location in each video frame (Section 2 discusses using many different
localization algorithms whose outputs are fused in the spatio-temporal decision fusion
module over multiple frames to verify that a potential text is text. In the last sentence of
section 2 the article teaches that, because text lasts over several frames, the text
may be determined. The Abstract at the last sentence teaches determining if the text is
static.). This step does not state what function the segmenting and correlation tracking
perform; thus, the scope of the claim is broad and the claim does not claim when the
template is determined and when the segmenting and correlation tracking are performed.
Therefore in this comprising claim all that is needed is for the reference to teach the
claimed steps.

Conclusion

8. Any amendments to the claims to overcome the Antani article and the Chun
article should also be compared with the previously cited Sato article, Video OCR for
Digital News Archives, and the Jeong article, Neural Network-Based Text Location for
News Video Indexing, which also determine static overlays in video.

9. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

10. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Jeffery A Brier whose telephone number is (571) 272-7656.
The examiner can normally be reached on M-F from 7:00 to 3:30. If attempts to
reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael
Razavi, can be reached at (571) 272-7664. The fax phone number for the organization
where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 